
30 research outputs found

On-stack replacement, distilled

Author: Bhandari Abhilash
Copperman Max
Earl
Evans Thomas G.
Hayden Christopher M.
Henning John L.
Jens Palsberg Guo
Magill Stephen
Nurudeen
Ottenstein Karl J.
Paleczny Michael
Pnueli Amir
Subramanian Suriya
Wang Kunshan
Wu Le-Chun
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

On-stack replacement (OSR) is an essential technology for adaptive optimization, allowing changes to code actively executing in a managed runtime. The engineering aspects of OSR are well known among VM architects, with several implementations available to date. However, OSR has yet to be explored as a general means to transfer execution between related program versions, which can pave the road to unprecedented applications that stretch beyond VMs. We aim at filling this gap with a constructive and provably correct OSR framework, allowing a class of general-purpose transformation functions to yield a special-purpose replacement. We describe and evaluate an implementation of our technique in LLVM. As a novel application of OSR, we present a feasibility study on debugging of optimized code, showing how our techniques can be used to fix variables holding incorrect values at breakpoints due to optimizations.

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Write-rationing garbage collection for hybrid memories

Author: Akram Shoaib
Blackburn Stephen M.
Bois Kristof Du
Burr Geoffrey W.
Frampton Daniel
Ha Jungwoo
Hicks Michael
Huang Xianglong
Jantz Michael R.
Jones Richard
Kyrola Aapo
Lee Benjamin C.
Li Sheng
Lim Kevin
Nguyen Khanh
Paleczny Michael
Qureshi Moinuddin K.
Sartor Jennifer B.
Shahriyar Rifat
Stephen
Wang Chenxi
Yang Xi
Zhang Lunkai
Zhao Yi
Zhou Yuanyuan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2018

Emerging Non-Volatile Memory (NVM) technologies offer high capacity and energy efficiency compared to DRAM, but suffer from limited write endurance and longer latencies. Prior work seeks the best of both technologies by combining DRAM and NVM in hybrid memories to attain low latency, high capacity, energy efficiency, and durability. Coarse-grained hardware and OS optimizations then spread writes out (wear-leveling) and place highly mutated pages in DRAM to extend NVM lifetimes. Unfortunately, even with these coarse-grained methods, popular Java applications exact impractical NVM lifetimes of 4 years or less. This paper shows how to make hybrid memories practical, without changing the programming model, by enhancing garbage collection in managed language runtimes. We find object write behaviors offer two opportunities: (1) 70% of writes occur to newly allocated objects, and (2) 2% of objects capture 81% of writes to mature objects. We introduce write-rationing garbage collectors that exploit these fine-grained behaviors. They extend NVM lifetimes by placing highly mutated objects in DRAM and read-mostly objects in NVM. We implement two such systems. (1) Kingsguard-nursery places new allocation in DRAM and survivors in NVM, reducing NVM writes by 5x versus NVM only with wear-leveling. (2) Kingsguard-writers (KG-W) places nursery objects in DRAM and survivors in a DRAM observer space. It monitors all mature object writes and moves unwritten mature objects from DRAM to NVM. Because most mature objects are unwritten, KG-W exploits NVM capacity while increasing NVM lifetimes by 11x. It reduces the energy-delay product by 32% over DRAM-only and 29% over NVM-only. This work opens up new avenues for making hybrid memories practical.
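The KG-W placement policy described in the abstract can be sketched roughly as follows. This is a toy Python model, not the authors' collector: the `Space` names, the `write` barrier, the separate mature DRAM space for written objects, and the assumption that every object on the heap is live are all illustrative choices made here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Space(Enum):
    NURSERY_DRAM = "nursery (DRAM)"
    OBSERVER_DRAM = "observer (DRAM)"
    MATURE_DRAM = "mature (DRAM)"   # assumed home for written survivors
    MATURE_NVM = "mature (NVM)"


@dataclass
class Obj:
    name: str
    space: Space = Space.NURSERY_DRAM   # new allocation starts in DRAM
    written_in_observer: bool = False


def write(obj: Obj) -> None:
    """Hypothetical write barrier: remember writes to observed objects."""
    if obj.space is Space.OBSERVER_DRAM:
        obj.written_in_observer = True


def nursery_collection(heap: List[Obj]) -> None:
    """Move nursery survivors into the DRAM observer space.

    Reachability is not modeled: every object is treated as a survivor.
    """
    for obj in heap:
        if obj.space is Space.NURSERY_DRAM:
            obj.space = Space.OBSERVER_DRAM


def observer_collection(heap: List[Obj]) -> None:
    """Send unwritten observed objects to NVM; keep written ones in DRAM."""
    for obj in heap:
        if obj.space is Space.OBSERVER_DRAM:
            if obj.written_in_observer:
                obj.space = Space.MATURE_DRAM
            else:
                obj.space = Space.MATURE_NVM
```

In this model, an object that is written while under observation stays in DRAM, while a read-mostly object ends up in NVM, which is the split the abstract attributes to write-rationing collectors.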

Crossref

Ghent University Academic Bibliography

A Simple Graph-Based Intermediate Representation

Author: Cliff Click
Michael Paleczny
Publication date: 01/01/1995

We present a graph-based intermediate representation (IR) with simple semantics and a low-memory-cost C++ implementation. The IR uses a directed graph with labeled vertices and ordered inputs but unordered outputs. Vertices are labeled with opcodes, edges are unlabeled. We represent the CFG and basic blocks with the same vertex and edge structures. Each opcode is defined by a C++ class that encapsulates opcode-specific data and behavior. We use inheritance to abstract common opcode behavior, allowing new opcodes to be easily defined from old ones. The resulting IR is simple, fast and easy to use.

1. Introduction

Intermediate representations do not exist in a vacuum. They are the stepping stone from what the programmer wrote to what the machine understands. Intermediate representations must bridge a large semantic gap (for example, from FORTRAN 90 vector operations to a 3-address add in some machine code). During the translation from a high-level language to machine code, an optimizin..
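The structure described in the abstract (opcode-labeled vertices, ordered inputs, unordered outputs, opcodes as classes related by inheritance) might look like the sketch below. The paper's implementation is in C++; this Python version is only an illustration, and the class names `Node`, `Region`, `Constant`, `Binary`, `Add`, `Sub` and the `fold` method are assumptions made here, not the paper's API.

```python
class Node:
    """A vertex labeled with an opcode; edges themselves carry no labels."""
    opcode = "Node"

    def __init__(self, *inputs):
        self.inputs = list(inputs)   # ordered: operand position matters
        self.outputs = set()         # unordered: the set of users
        for producer in inputs:
            producer.outputs.add(self)


class Region(Node):
    """A basic block, built from the same vertex and edge structures.

    Its inputs are the control-flow predecessors.
    """
    opcode = "Region"


class Constant(Node):
    """Opcode-specific data lives in the opcode's own class."""
    opcode = "Con"

    def __init__(self, value):
        super().__init__()
        self.value = value


class Binary(Node):
    """Behavior shared by all two-operand opcodes."""

    def apply(self, lhs, rhs):
        raise NotImplementedError

    def fold(self):
        """Return a Constant if both operands are constants, else self."""
        lhs, rhs = self.inputs
        if isinstance(lhs, Constant) and isinstance(rhs, Constant):
            return Constant(self.apply(lhs.value, rhs.value))
        return self


class Add(Binary):
    opcode = "Add"

    def apply(self, lhs, rhs):
        return lhs + rhs


class Sub(Binary):
    """A new opcode defined from an old one by overriding one method."""
    opcode = "Sub"

    def apply(self, lhs, rhs):
        return lhs - rhs
```

Keeping inputs in a list and outputs in a set mirrors the abstract's ordered-inputs, unordered-outputs distinction; the constant-folding method is just one example of behavior that a shared base class can provide to several opcodes.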

CiteSeerX

Crossref

Compiler Support for Out-of-Core Arrays on Parallel Machines

Author: Charles Koelbel
Ken Kennedy
Michael Paleczny
Publication venue: IEEE Computer Society Press

Many computational methods are currently limited by the size of physical memory, the latency of disk storage, and the difficulty of writing an efficient out-of-core version of the application. We are investigating a compiler-based approach to the above problem. In general, our compiler techniques attempt to choreograph I/O for an application based on high-level programmer annotations similar to Fortran D's DECOMPOSITION, ALIGN, and DISTRIBUTE statements. The central problem is to generate "deferred routines" which delay computations until all the data they require have been read into main memory. We present the results for two applications, LU factorization and red-black relaxation, on 1 to 32 nodes of an Intel Paragon after hand application of these compiler techniques.

1 Introduction

Improvements in processor performance have outpaced developments in both memory and disk I/O speed. As a result, out-of-core applications, which require significantly more data than will fit into RAM,..
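The idea of a "deferred routine" (a computation held back until all the data it needs is in main memory) can be illustrated with a small runtime model. The paper generates such routines in the compiler; the Python sketch below only mimics the delay-until-loaded behavior, and the names `DeferredRoutine`, `TileScheduler`, and `tile_loaded` are hypothetical.

```python
from typing import Callable, Dict, List, Set


class DeferredRoutine:
    """A computation that may run only once all of its tiles are in memory."""

    def __init__(self, needs: Set[str],
                 compute: Callable[[Dict[str, list]], float]):
        self.needs = needs
        self.compute = compute


class TileScheduler:
    """Tracks which tiles are in core and runs routines as they become ready."""

    def __init__(self) -> None:
        self.in_core: Dict[str, list] = {}
        self.pending: List[DeferredRoutine] = []
        self.results: List[float] = []

    def defer(self, routine: DeferredRoutine) -> None:
        self.pending.append(routine)
        self._run_ready()

    def tile_loaded(self, tile_id: str, data: list) -> None:
        """Called when a (simulated) disk read of one tile completes."""
        self.in_core[tile_id] = data
        self._run_ready()

    def _run_ready(self) -> None:
        still_waiting = []
        for routine in self.pending:
            if routine.needs.issubset(self.in_core):
                self.results.append(routine.compute(self.in_core))
            else:
                still_waiting.append(routine)
        self.pending = still_waiting
```

A routine registered with `defer` stays pending while any of its tiles is missing and runs as soon as the last one arrives. Eviction of tiles and overlapping I/O with computation are not modeled here.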

CiteSeerX

Compiler Support for Out-of-Core Arrays on Parallel Machines

Author: Charles Koelbel
Ken Kennedy
Michael Paleczny
Publication venue: IEEE Computer Society Press

CiteSeerX

A Model and Compilation Strategy for Out-of-Core Data Parallel Programs

Author: Alok Choudhary
Charles Koelbel
Ken Kennedy
Michael Paleczny
Rajesh Bordawekar
Publication venue: ACM Press
Publication date: 01/01/1995

It is widely acknowledged in high-performance computing circles that parallel input/output needs substantial improvement in order to make scalable computers truly usable. We present a data storage model that allows processors independent access to their own data and a corresponding compilation strategy that integrates data-parallel computation with data distribution for out-of-core problems. Our results compare several communication methods and I/O optimizations using two out-of-core problems, Jacobi iteration and LU factorization.

1 Introduction

There can be no argument that high-performance I/O is essential to high-performance computing. Many users see parallelism as the best way to achieve this performance, thus motivating the calls for "parallel I/O." Informal surveys of users of high-performance computing have ..

CiteSeerX